Learning Curves for Stochastic Gradient Descent in Linear Feedforward Networks

Authors

  • Justin Werfel
  • Xiaohui Xie
  • H. Sebastian Seung
Abstract

Gradient-following learning methods can encounter problems of implementation in many applications, and stochastic variants are sometimes used to overcome these difficulties. We analyze three online training methods used with a linear perceptron: direct gradient descent, node perturbation, and weight perturbation. Learning speed is defined as the rate of exponential decay in the learning curves. When the scalar parameter that controls the size of weight updates is chosen to maximize learning speed, node perturbation is slower than direct gradient descent by a factor equal to the number of output units; weight perturbation is slower still by an additional factor equal to the number of input units. Parallel perturbation allows faster learning than sequential perturbation, by a factor that does not depend on network size. We also characterize how uncertainty in quantities used in the stochastic updates affects the learning curves. This study suggests that in practice, weight perturbation may be slow for large networks, and node perturbation can have performance comparable to that of direct gradient descent when there are few output units. However, these statements depend on the specifics of the learning problem, such as the input distribution and the target function, and are not universally applicable.
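To make the three update rules concrete, here is a minimal NumPy sketch. It is not the authors' code: the network sizes, base learning rate, noise scale sigma, and Gaussian input distribution are all assumptions chosen for illustration. Each stochastic method uses parallel perturbation, jittering all outputs or all weights at once and using the resulting change in loss as a gradient estimate.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_out = 20, 2                          # illustrative sizes, not from the paper
W_star = rng.standard_normal((n_out, n_in))  # random "teacher" defining the target

def sq_loss(W, x, y_star):
    """Squared output error E = |W x - y*|^2 / 2."""
    e = W @ x - y_star
    return 0.5 * float(e @ e)

def train(method, eta, sigma=1e-3, steps=4000):
    """One online run; returns the per-example loss curve."""
    W = np.zeros((n_out, n_in))
    curve = np.empty(steps)
    for t in range(steps):
        x = rng.standard_normal(n_in)        # i.i.d. Gaussian inputs (an assumption)
        y_star = W_star @ x
        base = sq_loss(W, x, y_star)
        if method == "gradient":             # direct gradient descent
            dW = eta * np.outer(y_star - W @ x, x)
        elif method == "node":               # parallel node perturbation: jitter outputs
            xi = sigma * rng.standard_normal(n_out)
            pert = 0.5 * float(np.sum((W @ x + xi - y_star) ** 2))
            dW = -eta * (pert - base) / sigma**2 * np.outer(xi, x)
        else:                                # "weight": parallel weight perturbation
            psi = sigma * rng.standard_normal((n_out, n_in))
            pert = sq_loss(W + psi, x, y_star)
            dW = -eta * (pert - base) / sigma**2 * psi
        W += dW
        curve[t] = base
    return curve

# Step sizes scaled down by the factors quoted in the abstract (base rate is ad hoc).
for method, eta in [("gradient", 0.02),
                    ("node",     0.02 / n_out),
                    ("weight",   0.02 / (n_out * n_in))]:
    curve = train(method, eta)
    print(f"{method:8s} mean loss over last 100 steps: {curve[-100:].mean():.3e}")
```

On a typical run with these settings, direct gradient descent reaches near-zero loss fastest, node perturbation is noticeably slower, and weight perturbation slower still, echoing the n_out and n_out·n_in factors above. This toy run does not substitute for the paper's analysis, which derives the exact exponential decay rates.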


Similar Articles

Identification of Multiple Input-multiple Output Non-linear System Cement Rotary Kiln using Stochastic Gradient-based Rough-neural Network

Because of the interactions among the variables of a multiple-input multiple-output (MIMO) nonlinear system, its identification is a difficult task, particularly in the presence of uncertainties. The cement rotary kiln (CRK) is a MIMO nonlinear system in a cement factory with a complicated mechanism and uncertain disturbances. The identification of CRK is very important for different pur...


Quality Estimation from Scratch

This thesis presents a deep neural network for word-level machine translation quality estimation. The model extends the feedforward multi-layer architecture of [Collobert et al., 2011] to learn continuous-space representations for bilingual contexts from scratch. By means of stochastic gradient descent and backpropagation of errors, the model is trained for binary classification of translate...


The Outer Product Structure of Neural Network Derivatives

Training methods for neural networks are primarily variants on stochastic gradient descent. Techniques that use (approximate) second-order information are rarely used because of the computational cost and noise associated with those approaches in deep learning contexts. We can show that feedforward and recurrent neural networks exhibit an outer product derivative structure but that convolutiona...

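The outer-product claim above is easy to verify for the simplest case, a single dense layer: with E = |Wx - y|^2 / 2, the gradient is dE/dW = (Wx - y) x^T, exactly an outer product. A small numerical check (sizes and values arbitrary, chosen only for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
W = rng.standard_normal((3, 5))
x = rng.standard_normal(5)
y = rng.standard_normal(3)

delta = W @ x - y                 # output-side error signal
grad_outer = np.outer(delta, x)   # claimed outer-product form of dE/dW

def E(Wm):
    r = Wm @ x - y
    return 0.5 * float(r @ r)

# Finite-difference check of dE/dW, entry by entry
eps = 1e-6
grad_fd = np.zeros_like(W)
for i in range(W.shape[0]):
    for j in range(W.shape[1]):
        Wp = W.copy(); Wp[i, j] += eps
        grad_fd[i, j] = (E(Wp) - E(W)) / eps

print(np.allclose(grad_outer, grad_fd, atol=1e-4))   # True
```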

Feedforward Neural Networks

Here x is an input, y is a “label”, v ∈ Rd is a parameter vector, and f(x, y) ∈ Rd is a feature vector that corresponds to a representation of the pair (x, y). Log-linear models have the advantage that the feature vector f(x, y) can include essentially any features of the pair (x, y). However, these features are generally designed by hand, and in practice this is a limitation. It can be laborio...

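For context on the log-linear setup this blurb describes, here is a minimal sketch; the toy feature function f, the label set, and the parameter values are made up for illustration and are not from the thesis. The model scores a pair (x, y) as v · f(x, y) and normalizes over the labels.

```python
import numpy as np

def f(x, y, d=4):
    """Hypothetical hand-designed feature vector f(x, y) in R^d."""
    phi = np.zeros(d)
    phi[0] = 1.0                                   # bias feature
    phi[1] = len(x) if y == "long" else 0.0        # length feature, fires for one label
    phi[2] = x.count("a") if y == "has_a" else 0.0 # character-count feature
    phi[3] = 1.0 if y == "other" else 0.0          # indicator feature
    return phi

def p(y, x, v, labels):
    """Log-linear model: p(y | x) proportional to exp(v . f(x, y))."""
    scores = np.array([v @ f(x, yy) for yy in labels])
    scores -= scores.max()                         # for numerical stability
    probs = np.exp(scores) / np.exp(scores).sum()
    return probs[labels.index(y)]

labels = ["long", "has_a", "other"]
v = np.array([0.1, 0.05, 0.3, -0.2])               # parameter vector (arbitrary values)
print(p("has_a", "banana", v, labels))
```

The thesis's point is that designing f by hand, as above, is laborious and limiting, which motivates learning the representation from data instead.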

Feedforward Neural Networks in Reinforcement Learning Applied to High-Dimensional Motor Control

Local linear function approximators are often preferred to feedforward neural networks for estimating value functions in reinforcement learning. Still, the motor tasks usually solved by such methods have low-dimensional state spaces. This article demonstrates that feedforward neural networks can be applied successfully to high-dimensional problems. The main difficulties of using backpropagati...




Journal:
  • Neural Computation

Volume 17, Issue 12

Pages: -

Published: 2005